ARIES: A lexical platform for engineering Spanish processing tools

Authors

  • José Miguel Goñi-Menoyo
  • José Carlos González
  • Antonio Moreno

Abstract

We present a lexical platform that has been developed for the Spanish language. It achieves both portability across different computer systems and efficiency in terms of lexical coverage. A model is presented for the full treatment of Spanish inflectional morphology for verbs, nouns and adjectives. This model permits word formation based solely on morpheme concatenation, driven by a feature-based unification grammar. The runtime lexicon is a collection of allomorphs for both stems and endings. Although it has not been tested on them, the model should also be suitable for other highly inflected languages, such as the Romance languages. A formalism is also described for encoding a lemma-based lexical source that is well suited for expressing linguistic generalizations: inheritance classes, lemma encoding, morpho-graphemic allomorphy rules and limited type-checking. From this source base, we can automatically generate an allomorph-indexed dictionary suitable for efficient retrieval and processing. A set of software tools has been implemented around this formalism: aids for augmenting the lexical base, lexical compilers that build runtime dictionaries and access libraries for them, feature manipulation libraries, unification and pseudo-unification modules, morphological processors, a parsing system, etc. Software interfaces among the different modules and tools are cleanly defined to ease software integration and to allow the tools to be combined flexibly. Directions for accessing our e-mail and web demonstration prototypes are also provided. We also give figures comparing the lexical coverage of our platform with that of some popular spelling checkers. This work has been supported in part by the Spanish Plan Nacional de I+D, through the project TIC91-0217C02-01, ARIES: An Architecture for Natural Language InterfacES with User Modelling.
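The abstract describes word formation as pure concatenation of stem and ending allomorphs, with a feature-based unification grammar deciding which combinations are legal. Below is a minimal Python sketch of that idea, not the ARIES implementation. The lexicon entries, the feature names (such as `stem_stress`), and the functions `unify`, `analyze` and `generate` are all hypothetical. The flat dictionary unification is a simplified stand-in for the paper's richer unification grammar. The example uses the Spanish diphthongizing verb *contar*, whose stem has two allomorphs, *cont-* and *cuent-*.

```python
from typing import Dict, List, Optional

# A feature structure, simplified here to a flat attribute -> value map.
Features = Dict[str, str]

# Hypothetical runtime lexicon: each surface allomorph maps to its feature sets.
# The verb "contar" has two stem allomorphs: "cont-" (unstressed) and
# "cuent-" (stressed, diphthongized). The feature names are illustrative only.
STEMS: Dict[str, List[Features]] = {
    "cont": [{"lemma": "contar", "cat": "v", "conj": "1", "stem_stress": "no"}],
    "cuent": [{"lemma": "contar", "cat": "v", "conj": "1", "stem_stress": "yes"}],
}
ENDINGS: Dict[str, List[Features]] = {
    # -o (1sg present indicative) leaves the stem stressed: cuent-o.
    "o": [{"cat": "v", "stem_stress": "yes", "mood": "ind", "tense": "pres",
           "person": "1", "number": "sg"}],
    # -amos and -ar carry the stress themselves: cont-amos, cont-ar.
    "amos": [{"cat": "v", "conj": "1", "stem_stress": "no", "mood": "ind",
              "tense": "pres", "person": "1", "number": "pl"}],
    "ar": [{"cat": "v", "conj": "1", "stem_stress": "no", "form": "inf"}],
}


def unify(a: Features, b: Features) -> Optional[Features]:
    """Flat unification: merge two feature sets, or return None on a value clash."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None
        result[key] = value
    return result


def analyze(word: str) -> List[Features]:
    """Split `word` into stem + ending allomorphs whose features unify."""
    analyses = []
    for i in range(1, len(word)):
        stem, ending = word[:i], word[i:]
        for stem_feats in STEMS.get(stem, []):
            for ending_feats in ENDINGS.get(ending, []):
                merged = unify(stem_feats, ending_feats)
                if merged is not None:
                    analyses.append(merged)
    return analyses


def generate(lemma: str, target: Features) -> List[str]:
    """Concatenate compatible allomorphs of `lemma` whose features include `target`."""
    forms = []
    for stem, stem_entries in STEMS.items():
        for stem_feats in stem_entries:
            if stem_feats.get("lemma") != lemma:
                continue
            for ending, ending_entries in ENDINGS.items():
                for ending_feats in ending_entries:
                    merged = unify(stem_feats, ending_feats)
                    # Keep the form only if every requested feature is present.
                    if merged is not None and all(
                        merged.get(k) == v for k, v in target.items()
                    ):
                        forms.append(stem + ending)
    return forms
```

With this toy lexicon, `analyze("cuento")` returns a single analysis (lemma "contar", first person singular present indicative). `analyze("cuentamos")` and `analyze("conto")` both return empty lists, because the stress feature of the stem clashes with that of the ending. `generate("contar", {"form": "inf"})` returns `["contar"]`. The sketch stores the allomorphs directly in the lexicon, as the abstract describes, so the runtime morphology reduces to splitting and unification. Any allomorphy rules would do their work earlier, when the allomorph-indexed dictionary is generated from the lemma-based source.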

Journal: Natural Language Engineering
Volume: 3
Publication date: 1997